Toward Synthesizing Expressive Mandarin Speech
Abstract
Research in text-to-speech (TTS) synthesis has emphasized the naturalness of synthesized speech to support various applications in Human-Computer Interaction (HCI). The ideal synthetic speech for HCI should not only be pronounced properly, but also convey the appropriate semantics within the context of use. “Context” refers to the textual context of the document, the identity of the interlocutors in an interactive conversation, the application scenario, etc. For example, synthetic speech for news reports may adopt a lucid and smooth character, while sports commentaries may call for a more animated one. This paper focuses on expressive text-to-speech synthesis. Expression in speech encompasses many elements; our work focuses on two of them, emotion and style. Emotion originates from the speaker’s psychological and physical state and is realized through spectral and prosodic parameters. Style depends on the semantics of the spoken message and the conversation scenario, so it can be realized with global prosodic features. Emotion and style are also interdependent. In general, emotion has relatively local effects and its acoustic parameters are more dynamic, while style has relatively global effects and its acoustic parameters are more stable in the speech signal. Emotion and style thus jointly modify the acoustic features of the speech signal for more affective and effective conveyance of the underlying message. A TTS system that can simulate different emotions and styles will therefore make HCI more natural and desirable.
Related Resources
An Expressive Mandarin Speech Corpus
The paper introduces an expressive Mandarin speech corpus, which is supported by the National Hi-tech program (863) and the National Science Funding of China (NSFC), for research into expressive speech information processing. The corpus contains emotional speech, dialogue speech, etc. In order to get the subtle acoustic information, the paper also presents the annotation methods with multiple perceptio...
An Analysis of speeches of Hussein ibn Ali (AS) in the first step toward the incident of Karbala (Departing Medina to Mecca) based on John Searle’s Speech Acts
Linguistic theories can open new doors to historical analysis. This paper seeks to analyze the speeches of Hussein ibn Ali in the first step toward the incident of Karbala, which was his departure from Medina to Mecca. The Speech Acts theory, which is rooted in Discourse Analysis, focuses on the role of language. It sees speech as an act that brings about actions in this world. Searle introduces only...
Hierarchical stress modeling and generation in Mandarin for expressive Text-to-Speech
Expressive speech synthesis has received increased attention in recent times. Stress (or pitch accent) is the perceptual prominence within words or utterances, which contributes to the expressivity of speech. This paper summarizes our contribution to Mandarin expressive speech synthesis. A novel hierarchical stress modeling and generation method for Mandarin is proposed and further integrated i...
Modeling the Acoustic Correlates of Dialog Act for Expressive Chinese TTS Synthesis
This paper proposed a novel approach for describing the expressivity of dialog text and modelling their acoustic correlates for expressive text-to-speech (TTS) synthesis. We applied the Dialog Acts (DAs) in describing expressivity. In particular, we set up a Wizard-of-Oz (WoZ) data collection framework to collect the tourism domain corpus and annotated the DAs. A Pitch Target model which is opt...
Prosody Conversion for Emotional Mandarin Speech Synthesis Using the Tone Nucleus Model
In this paper, the tone nucleus model is employed to represent and convert the F0 contour for synthesizing emotional Mandarin speech from neutral speech. Compared with previous prosody transforming methods, the proposed method 1) only converts the tone nucleus part of each syllable rather than the whole F0 contour to avoid the data sparseness problems; 2) builds mapping functions for well-chosen t...
Publication date: 2005